Breath-Detection-Based Telephony Speech Phrasing
Abstract
Automatic speech recognition (ASR) has long attracted attention for call center monitoring systems. ASR systems for call center conversations usually divide the input signal into separate utterances and discard the unneeded silent parts before running recognition on the detected utterances. The input signal should therefore be split into utterances of a length that suits both ASR performance and readability. However, typical voice activity detection (VAD) techniques sometimes generate overly long speech segments because they consider only the length of the pause (non-speech) between sentences. In contrast, it has been shown that speakers typically take breaths when speaking more than one sentence or a long sentence, and these breaths are highly correlated with major prosodic breaks. In this paper, we focus on the breath events in the pause intervals and attempt to split the input signal into utterances by detecting them. The proposed method leverages acoustic information specialized for breathing sounds in a two-step approach that detects breath events with an accuracy of 97.4%. In addition, speech phrasing based on breath events improved the word error rate in ASR.
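The contrast the abstract draws between pause-length segmentation and breath-based phrasing can be illustrated with a minimal sketch. It is not the authors' method: the `Pause` record, the `has_breath` flag (standing in for the output of some breath detector), the function names, and the 0.5-second threshold are all assumptions made for illustration, and the two-step breath detector itself is not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds


@dataclass
class Pause:
    """A non-speech interval; has_breath is a hypothetical detector output."""
    start: float
    end: float
    has_breath: bool


def _segments(total: float, cuts: List[Pause]) -> List[Segment]:
    """Return the speech spans that remain after removing the cut pauses."""
    segments: List[Segment] = []
    cursor = 0.0
    for pause in sorted(cuts, key=lambda p: p.start):
        if pause.start > cursor:
            segments.append((cursor, pause.start))
        cursor = max(cursor, pause.end)
    if cursor < total:
        segments.append((cursor, total))
    return segments


def split_by_pause_length(total: float, pauses: List[Pause],
                          min_pause: float = 0.5) -> List[Segment]:
    """Baseline: cut only where a pause lasts at least min_pause seconds."""
    return _segments(total, [p for p in pauses if p.end - p.start >= min_pause])


def split_by_breath(total: float, pauses: List[Pause]) -> List[Segment]:
    """Breath-based phrasing: cut at every pause flagged as containing a breath."""
    return _segments(total, [p for p in pauses if p.has_breath])


if __name__ == "__main__":
    pauses = [
        Pause(2.0, 2.3, True),   # short pause with a breath
        Pause(5.0, 5.2, False),  # short pause, no breath
        Pause(7.0, 7.4, True),   # short pause with a breath
    ]
    print(split_by_pause_length(10.0, pauses))
    print(split_by_breath(10.0, pauses))
```

With these made-up pauses, none of which reaches the assumed length threshold, the baseline keeps the whole ten seconds as one segment, while the breath-based split cuts at the two pauses flagged as breaths.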
Similar resources
Style-Specific Phrasing in Speech Synthesis
People pause between words and sentences when they speak. They pause to emphasize content, or to make an utterance more understandable, or just to take a breath. A speech synthesizer should also insert similar pauses to sound natural. The process of inserting prosodic breaks in an utterance is called Phrasing. Phrasing is a crucial step during speech synthesis because other models of prosody de...
Intonation patterns in older children with cerebral palsy before and after speech intervention.
PURPOSE This paper examined the production of intonation patterns in children with developmental dysarthria associated with cerebral palsy (CP) prior to and after speech intervention focussing on respiration and phonation. The study further sought to establish whether intonation performance might be related to changes in speech intelligibility. METHOD Intonation patterns were examined using c...
Speech recognition with automatic punctuation
We present a method of speech recognition with automatic punctuation based on a combination of acoustic and lexical evidence. In the recognizer vocabulary, punctuation marks are treated as word entries. By assigning the acoustic baseforms of silence, breath, and other non-speech sounds to punctuation marks, and using a properly processed N-gram language model, unpronounced punctuation marks of ...
Several Aspects of Machine-Driven Phrasing in Text-to-Speech Systems
The article discusses differences between a priori and a posteriori phrasing and their importance in the task of automatic prosodic phrasing in text-to-speech systems. On several examples it illustrates shortcomings of common evaluation of a priori phrasing performance using a posteriori phrasing of referential corpus data. The paper also proposes and evaluates a method for a priori phrasing ba...
Acoustic Cues for Automatic Determination of Phrasing
This paper proposes a framework of automatic determination of phrasing using acoustic features derived from the speech signal. The feature vectors were defined in a series of analyses investigating the acoustic-phonetic realization of minor and major phrase boundaries and different boundary types. The resulting representation was used to train statistical classifiers to automatically determine ...
Publication date: 2011